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Abstract 
This document recommends methods that are intended to improve the 
scalability and stability of large networks using Open Shortest Path 
First (OSPF) Version 2 protocol. The methods include processing OSPF 
Hellos and Link State Advertisement (LSA) Acknowledgments at a higher 
priority compared to other OSPF packets, and other congestion 


avoidance procedures. 
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Les 


Introduction 


In this document, OSPF refers to OSPFv2 [Refl]. The scalability and 
stability improvement techniques described here may also apply to 
OSPFv3 [Ref2], but that will require further study and operational 
experience. 


A large network running OSPF protocol may occasionally experience the 
simultaneous or near-simultaneous update of a large number of link 
state advertisements, or LSAs. This is particularly true if OSPF 
traffic engineering extension [Ref3] is used that may significantly 
increase the number of LSAs in the network. We call this event an 
LSA storm and it may be initiated by an unscheduled failure or a 
scheduled maintenance event. The failure may be hardware, software, 
or procedural in nature. 


The LSA storm causes high CPU and memory utilization at the router 
causing incoming packets to be delayed or dropped. Delayed 
acknowledgments (beyond the retransmission timer value) result in 
retransmissions, and delayed Hello packets (beyond the router-dead 
interval) result in neighbor adjacencies being declared down. The 
retransmissions and additional LSA originations result in further CPU 
and memory usage, essentially causing a positive feedback loop, 
which, in the extreme case, may drive the network to an unstable 
state. 


The default value of the retransmission timer is 5 seconds and that 
of the router-dead interval is 40 seconds. However, recently there 
has been a lot of interest in significantly reducing OSPF convergence 
time. As part of that plan, much shorter (sub-second) Hello and 
router-dead intervals have been proposed [Ref4]. In such a scenario, 
it will be more likely for Hello packets to be delayed beyond the 
router-dead interval during network congestion caused by an LSA 
storm. 


In order to improve the scalability and stability of networks, we 
recommend steps for prioritizing critical OSPF packets and avoiding 
congestion. The details of the recommendations are given in Section 
2. A simulation study is reported in [Ref13] that quantifies the 
congestion phenomenon and its impact. It also studies several of the 
recommendations and shows that they indeed improve the scalability 
and stability of networks using OSPF protocol. [Ref13] is available 
on request by contacting the editor or one of the authors. 
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Appendix A explains in more detail LSA storm scenarios, their impact, 
and points out a few real-life examples of control-message storms. 
Appendix B provides a list of variables used in the recommendations 
and their example values. Appendix C provides some further 
recommendations and suggestions with similar goals. 


2. Recommendations 


The recommendations below are intended to improve the scalability and 
stability of large networks using OSPF protocol. During periods of 
network congestion, they would reduce retransmissions, avoid an 
adjacency to be declared down due to Hello packets being delayed 
beyond the RouterDeadInterval, and take other congestion avoidance 
steps. The recommendations are unordered except that Recommendation 
2 is to be implemented only if Recommendation 1 is not implemented. 


(1) Classify all OSPF packets in two classes: a "high priority" class 
comprising OSPF Hello packets and Link State Acknowledgement 
packets, and a "low priority" class comprising all other packets. 
The classification is accomplished by examining the OSPF packet 
header. While receiving a packet from a neighbor and while 
transmitting a packet to a neighbor, try to process a "high 
priority" packet ahead of a "low priority" packet. 


The prioritized processing while transmitting may cause OSPF 
packets from a neighbor to be received out of sequence. If 
Cryptographic Authentication (AuType = 2) is used (as specified 
in [Ref1l]), then successive received valid OSPF packets from a 
neighbor need to have a non-decreasing "Cryptographic sequence 
number". To comply with this requirement, we recommend that in 
case Cryptographic Authentication (AuType = 2) is used [Ref1], 
prioritized processing not be done at the transmitter. This will 
avoid packets arriving at the receiver out of sequence. However, 
after security processing at the receiver (including sequence 
number checking) is complete, the OSPF packets may be kept ina 
"high priority" queue or a "low priority" queue based on their 
class and processed accordingly. The benefit of prioritized 
processing is clearly higher in the absence of Cryptographic 
Authentication since in that case prioritization can be 
implemented both at the transmitter and at the receiver. 

However, even with Cryptographic Authentication it will be 
beneficial to have prioritization only at the receiver (following 
security processing). 


(2) If Recommendation 1 cannot be implemented, then reset the 
inactivity timer for an adjacency whenever any OSPF unicast 
packet or any OSPF packet sent to Al1SPFRouters over a point-to- 
point link is received over that adjacency instead of resetting 
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the inactivity timer only on receipt of the Hello packet. So 
OSPF would declare the adjacency to be down only if no OSPF 
unicast packets or no OSPF packets sent to Al1SPFRouters over a 
point-to-point link are received over that adjacency for a period 
equaling or exceeding the RouterDeadInterval. The reason for not 
recommending this proposal in conjunction with Recommendation 1 
is to avoid potential undesirable side effects. One such effect 
is the delay in discovering the down status of an adjacency ina 
case where no high priority Hello packets are being received but 
the inactivity timer is being reset by other stale packets in the 
low priority queue. 


Use an exponential backoff algorithm for determining the value of 
the LSA retransmission interval (RxmtInterval). Let R(i) 
represent the RxmtInterval value used during the i-th 
retransmission of an LSA. Use the following algorithm to compute 
R(i). 


R(itl) = Min(KR(i),Rmax) for i>=1 


where K, Rmin, and Rmax are constants and the function Min(.,.) 
represents the minimum value of its two arguments. Example 
values for K, Rmin, and Rmax may be 2, 5, and 40 seconds, 
respectively. Note that the example value for Rmin, the initial 
retransmission interval, is the same as the sample value of 
RxmtInterval in [Refl]. 


This recommendation is motivated by the observation that during a 
network congestion event caused by control messages, a major 
source for sustaining the congestion is the repeated 
retransmission of LSAs. The use of an exponential backoff 
algorithm for the LSA retransmission interval reduces the rate of 
LSA retransmissions while the network experiences congestion 
(during which it is more likely that multiple retransmissions of 
the same LSA would happen). This in turn helps the network get 
out of the congested state. 


Implicit Congestion Detection and Action Based on That: If there 
is control message congestion at a router, its neighbors do not 
know about that explicitly. However, they can implicitly detect 
it based on the number of unacknowledged LSAs to this router. If 
this number exceeds a certain "high-water mark", then the rate at 
which LSAs are sent to this router should be reduced 
progressively using an exponential backoff mechanism but not 
below a certain minimum rate. At a future time, if the number of 
unacknowledged LSAs to this router falls below a certain "low- 
water mark", then the rate of sending LSAs to this router should 
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be increased progressively, again using an exponential backoff 
mechanism but not above a certain maximum rate. The whole 
algorithm is given below. Note that this algorithm is to be 
applied independently to each neighbor and only for unicast LSAs 
sent to a neighbor or LSAs sent to Al1SPFRouters over a point- 
to-point link. 


Let, 

U(t) = Number of unacknowledged LSAs to neighbor at time t. 

H = A high-water mark (in units of number of unacknowledged 
LSAs). 

L = A low-water mark (in units of number of unacknowledged LSAs). 

G(t) = Gap between sending successive LSAs to neighbor at time t. 

F = The factor by which the above gap is to be increased during 

congestion and decreased after coming out of congestion. 

Minimum time that has to elapse before the existing gap 

is considered for change. 

Gmin = Minimum allowed value of gap. 

Gmax = Maximum allowed value of gap. 


T 


The equation below shows how the gap is to be changed after a 
time T has elapsed since the last change: 


| 

| Min(FG(t),Gmax) if U(t+T) > H 
G(t+T) = | G(t) if H >= U(t+T) >= L 

| Max(G(t)/F,Gmin) if U(t+T) < L 

| 


Min(.,.) and Max(.,.) represent the minimum and maximum values of 
the two arguments, respectively. 


Example values for the various parameters of the algorithm are as 
follows: H = 20, L= 10, F = 2, T = 1 second, Gmin = 20 ms, Gmax 
= 1 second. 


Recommendations 3 and 4 both slow down LSAs to congested 
neighbors based on implicitly detecting the congestion, but they 
have important differences. Recommendation 3 progressively slows 
down successive retransmissions of the same LSA, whereas 
Recommendation 4 progressively slows down all LSAs (new or 
retransmission) to a congested neighbor. 


Throttling Adjacencies to Be Brought Up Simultaneously: If a 
router tries to bring up a large number of adjacencies to its 
neighbors simultaneously, then that may cause severe congestion 
due to database synchronization and LSA flooding activities. It 
is recommended that during such a situation no more than "n" 
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adjacencies should be brought up simultaneously. Once a subset 
of adjacencies has been brought up successfully, newer 
adjacencies may be brought up as long as the number of 
simultaneous adjacencies being brought up does not exceed "n". 
The appropriate value of "n" would depend on the router 
processing power, total bandwidth available for control plane 
traffic, and propagation delay. The value of "n" should be 
configurable. 


In the presence of throttling, an important issue is the order in 
which adjacencies are to be formed. We recommend a First Come 
First Served (FCFS) policy based on the order in which the 
request for adjacency formation arrives. Requests may either be 
from neighbors or self-generated. Among the self-generated 
requests, a priority list may be used to decide the order in 
which the requests are to be made. However, once an adjacency 
formation process starts it is not to be preempted except for 
unusual circumstances such as errors or time-outs. 


In some of the recommendations above, we refer to point-to-point 
links. Those references should also include cases where a broadcast 
network is to be treated as a point-to-point connection from the 
standpoint of IP routing [Ref5] 


Security Considerations 


This memo does not create any new security issues for the OSPF 
protocol. 
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Appendix A. LSA Storm: Causes and Impact 


An LSA storm may be initiated due to many reasons. Here are some 
examples: 


(a) one or more link failures due to fiber cuts, 

(b) one or more router failures for some reason, e.g., software crash 
or some type of disaster (including power outage) in an office 
complex hosting many routers, 


(c) link/router flapping, 


(d) requirement of taking down and later bringing back many routers 
during a software/hardware upgrade, 


(e) near synchronization of the periodic 1800-second LSA refreshes of 
a subset of LSAs, 


(f) refresh of all LSAs in the system during a change in software 
version, 


(g) injecting a large number of external routes to OSPF due to a 
procedural error, 


(h) Router ID changes causing a large number of LSA re-originations 
(possibly LSA purges as well depending on the implementation). 


In addition to the LSAs originated as a direct result of link/router 
failures, there may be other indirect LSAs as well. One example in 
MPLS networks is traffic engineering LSAs [Ref3] originated at other 
links as a result of significant changes in reserved bandwidth. 

These result from rerouting of Label Switched Paths (LSPs) that went 
down during the link/router failure. The LSA storm causes high CPU 
and memory utilization at the router processor causing incoming 
packets to be delayed or dropped. Delayed acknowledgments (beyond 
the retransmission timer value) results in retransmissions, and 
delayed Hello packets (beyond the Router-Dead interval) results in 
links being declared down. A trunk-down event causes router LSA 
origination by its end-point routers. If traffic engineering LSAs 
are used for each link, then that type of LSA would also be 
originated by the end-point routers and potentially elsewhere as well 
due to significant changes in reserved bandwidths at other links 
caused by the failure and reroute of LSPs originally using the failed 
trunk. Eventually, when the link recovers that would also trigger 
additional router LSAs and traffic engineering LSAs. 
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The retransmissions and additional LSA originations result in further 
CPU and memory usage, essentially causing a positive feedback loop. 
We define the LSA storm size as the number of LSAs in the original 
storm, not counting any additional LSAs resulting from the feedback 
loop described above. If the LSA storm is too large, then the 
positive feedback loop mentioned above may be large enough to 
indefinitely sustain a large CPU and memory utilization at many 
routers in the network, thereby driving the network to an unstable 
state. In the past, network outage events have been reported in IP 
and ATM networks using link-state protocols such as OSPF, 
Intermediate System to Intermediate System (IS-IS), Private Network- 
Network Interface (PNNI), or some proprietary variants. See for 
example [Ref6-Ref9]. In many of these examples, large-scale flooding 
of LSAs or other similar control messages (either naturally or 
triggered by some bug or inappropriate procedure) have been partly or 
fully responsible for network instability and outage. 


In [Ref13], a simulation model is used to show that there is a 
certain LSA storm size threshold above which the network may show 
unstable behavior caused by a large number of retransmissions, link 
failures due to missed Hello packets, and subsequent link recoveries. 
It is also shown that the LSA storm size causing instability may be 
substantially increased by providing prioritized treatment to Hello 
and LSA Acknowledgment packets and by using an exponential backoff 
algorithm for determining the LSA retransmission interval. If it is 
not possible to prioritize Hello packets, then resetting the 
inactivity timer on receiving any valid OSPF packets can also provide 


the same benefit. Furthermore, if we prioritize Hello packets, then 
even when the network operates somewhat above the stability 
threshold, links are not declared down due to missed Hellos. This 


implies that even though there is control plane congestion due to 
many retransmissions, the data plane stays up and no new LSAs are 
originated (besides the ones in the original storm and the 
refreshes). These observations support the first three 
recommendations in Section 2. The authors of this document have also 
done simulations to verify that the other recommendations in Section 
2 help avoid congestion and allow a graceful exit from a congested 
state. 


One might argue that the scalability issue of large networks should 
be solved solely by dividing the network hierarchically into multiple 
areas so that flooding of LSAs remains localized within areas. 
However, this approach increases the network management and design 
complexity and may result in less optimal routing between areas. 
Also, Autonomous System External (ASE) LSAs are flooded throughout 
the AS, and it may be a problem if there are large numbers of them. 
Furthermore, a large number of summary LSAs may need to be flooded 
across areas, and their numbers would increase significantly if 
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multiple Area Border Routers are employed for the purpose of 
reliability. Thus, it is important to allow the network to grow 
towards as large a size as possible under a single area. 


The recommendations in the document are synergistic with a broader 
set of scalability and stability improvement proposals. [Ref10] 
proposes flooding overhead reduction in case more than one interface 
goes to the same neighbor. [Ref11] proposes a mechanism for greatly 
reducing LSA refreshes in stable topologies. 


[Ref12] proposes a wide range of congestion control and failure 
recovery mechanisms (some of those ideas are covered in this 
document, but [Ref12] has other ideas not covered here). 


Appendix B. List of Variables and Values 


F = The factor by which the gap between sending successive LSAs to 
a neighbor is to be increased during congestion and decreased 
after coming out of congestion (used in Recommendation 4). 
Example value is 2. 


G(t) = Gap between sending successive LSAs to a neighbor at time t 
(used in Recommendation 4). 


Gmax = Maximum allowed value of gap between sending successive LSAs 
to a neighbor (used in Recommendation 4). Example value is 1 
second. 


Gmin = Minimum allowed value of gap between sending successive LSAs 
to a neighbor (used in Recommendation 4). Example value is 20 
ms. 


H = A high-water mark (in units of number of unacknowledged LSAs). 
Exceeding this mark would trigger a potential increase in the 
gap between sending successive LSAs to a neighbor. (used in 
Recommendation 4). Example value is 20. 


K = A multiplicative constant used in increasing the RxmtInterval 
value used during successive retransmissions of the same LSA 
(used in Recommendation 3). Example value is 2. 


L = A low-water mark (in units of number of unacknowledged LSAs) 
Dropping below this mark would trigger a potential decrease in 
the gap between sending successive LSAs to a neighbor. (used 
in Recommendation 4). Example value is 10. 


n = Upper limit on the number of adjacencies to be brought up 
simultaneously (used in Recommendation 5). 
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R(i) 


= RxmtInterval value used during the i-th retransmission of an 
LSA (used in Recommendation 3). 


Rmax = The maximum allowed value of RxmtInterval (used in 


Recommendation 3). Example value is 40 seconds. 


Rmin = The minimum allowed value of RxmtInterval (used in 


U(t) 


Recommendation 3). Example value is 5 seconds. 


= Minimum time that has to elapse before the existing gap 
between sending successive LSAs to a neighbor is considered 
for change (used in Recommendation 4). Example value is 1 
second. 


= Number of unacknowledged LSAs to a neighbor at time t (used in 
Recommendation 4). 


Appendix C. Other Recommendations and Suggestions 


(1) 


Explicit Marking: In Section 2, we recommended that OSPF packets 
be classified to "high" and "low" priority classes based on 
examining the OSPF packet header. In some cases (particularly in 
the receiver), this examination may be computationally costly. 

An alternative would be the use of different TOS/Precedence field 
settings for the two priority classes. [Ref1] recommends setting 
the TOS field to 0 and the Precedence field to 6 for all OSPF 
packets. We recommend this same setting for the "low" priority 
OSPF packets and a different setting for the "high" priority OSPF 
packets in order to be able to classify them separately without 
having to examine the OSPF packet header. Two examples are given 
below: 


Example 1: For "low" priority packets, set TOS field to 0 and 
Precedence field to 6, and for "high" priority packets 
set TOS field to 4 and Precedence field to 6. 


Example 2: For "low" priority packets, set TOS field to 0 and 
Precedence field to 6, and for "high" priority packets 
set TOS field to 0 and Precedence field to 7. 


Note that the TOS/Precedence bits have been redefined by Diffserv 
(RFC 2474, [Ref14]). Also note that the different TOS/Precedence 
field settings suggested above only need to be agreed among the 
systems on the link. This recommendation is not needed to be 
followed if it is easy to examine the OSPF packet header and 
thereby separately classify "high" and "low" priority packets. 
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(2) Further Prioritization of OSPF Packets: Besides the packets 
designated as "high" priority in Recommendation 1 of Section 2, 
there may be a need for further priority separation among the 
"low" priority OSPF packets. We recommend the use of three 
priority classes: "high", "medium" and "low". While receiving a 
packet from a neighbor and while transmitting a packet to a 
neighbor, try to process a "high priority" packet ahead of 
"medium" and "low" priority packets and a "medium" priority 
packet ahead of "low priority" packets. The "high" priority 
packets are as designated in Recommendation 1 of Section 2. We 
provide below two candidate examples for "medium" priority 
packets. All OSPF packets not designated as "high" or "medium" 
priority are "low" priority. If Cryptographic Authentication 
(AuType = 2) is used (as specified in [Refl]), then prioritized 
treatment is to be provided only at the receiver and after 
security processing, but not at the transmitter since that may 
cause packets to arrive out of sequence and violate the 
requirements of "Autype = 2". 


One example of "medium" priority packet is the Database 
Description (DBD) packet from a slave (during the database 
synchronization process) that is used as an acknowledgment. 


A second example is an LSA carrying intra-area topology change 
information (this may trigger SPF calculation and rerouting of 
Label Switched Paths, so fast processing of this packet may 
improve OSPF/Label Distribution Protocol (LDP) convergence 
times). However, if the processing cost of identifying and 
separately queueing the LSA in this example is deemed to be high, 
then the implementer may decide not to do it. 


(3) Processing a Large Number of LSA Purges: Occasionally, some 
events in the network, such as router ID changes, may result in a 
large number of LSA re-originations and LSA purges. In such a 
scenario, one may consider processing LSAs in different order, 
e.g., processing LSA purges ahead of LSA originations. We, 
however, do not recommend out-of-order LSA processing for several 
reasons. First, detecting the LSA type ahead of queueing may be 
computationally expensive. Out-of-order processing may also 
cause subtle bugs. We do not want to recommend a major change in 
the LSA processing paradigm for a relatively rare event such as 
router ID change. However, a router with a changing ID may flush 
the old LSAs gradually without causing a storm. 
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